- Part 1: Introducing R
- Part 2: Data Wrangling
- Part 3: Visualizations and a bit of statistics
The complete source code for the webinars, together with all dependent data and files, can be found at Github.com/uashogeschoolutrecht.
2021-11-07 19:33:56
\(\text{Reproducible (Open) Science} = \text{Reproducible Research} + \text{Open Science}\)
Brown, Kaiser & Allison, PNAS, 2018
"…in science, three things matter:
everything else is a distraction."
Gollums lurking about
“In one case, a group accidentally used reverse-coded variables, making their conclusions the opposite of what the data supported.”
“In another case, authors received an incomplete dataset because entire categories of data were missed; when corrected, the qualitative conclusions did not change, but the quantitative conclusions changed by a factor of >7”
How would you ‘describe’ the steps of an analysis, or of creating a graph, when you use software with a GUI (graphical user interface)?
“You can only do this using code, so it is (basically) impossible in a GUI”
The file “./Rmd/steps_to_graph_from_excel_file.html” shows you how to do this using the programming language R. On day 3, we will revisit this example.
\(P\) = Publication, \(D\) = Data, \(C\) = Code, \(OAcc\) = Open Access, \(OSrc\) = Open Source
Assume we have the following question: “Which of 4 types of chair requires the least effort to get up from when seated?” We have the following setup:
To analyze this experiment statistically, the model needs to include:
- the rating score as the measured (or dependent) variable
- the type of chair as the experimental factor
- the subject as the blocking factor
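A quick way to see why subject acts as a blocking factor is to cross-tabulate subjects against chair types: in a complete randomized block design, every subject (block) is measured exactly once on every chair type. A minimal sketch, assuming the {nlme} package is installed (it ships the `ergoStool` data used later in this example); the name `design_table` is only illustrative.

```r
library(nlme)  # provides the ergoStool data set

# Count how often each subject (block) was measured on each chair type
design_table <- with(ergoStool, table(Subject, Type))
print(design_table)

# In a complete randomized block design every cell holds exactly one observation
all(design_table == 1)
```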
A typical analysis method for this type of randomized block design is a so-called ‘multi-level’ model, also called a ‘mixed-effects’ or ‘hierarchical’ model. This method is widely used in clinical and biological research.
You could also use a one-way ANOVA, but I will illustrate why this is not a good idea.
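One way to see the problem: a one-way ANOVA ignores the blocking factor, so all between-subject variation ends up in the residual error. The sketch below compares two ordinary `lm()` fits on the `ergoStool` data (assuming {nlme} is installed). Treating `Subject` as a fixed block term in `lm()` is used here only for comparison; it is not the mixed-effects model fitted later.

```r
library(nlme)  # provides the ergoStool data set

# One-way ANOVA: chair type only, subjects ignored
fit_oneway <- lm(effort ~ Type, data = ergoStool)

# Randomized block analysis: chair type plus subject as a blocking term
fit_block <- lm(effort ~ Type + Subject, data = ergoStool)

# Residual standard deviation of each model
sigma(fit_oneway)
sigma(fit_block)

# ANOVA tables: compare the residual mean squares and the F-test for Type
anova(fit_oneway)
anova(fit_block)
```

When the between-subject variation is removed from the error term, the residual standard deviation of the block model is smaller, so the comparison between chair types becomes more precise.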
In the next few slides, I hope to convince you of the power of (literate) programming to communicate such an analysis.
Wretenberg, Arborelius & Lindberg, 1993
library(nlme)   # provides the ergoStool data set
library(dplyr)  # provides %>% and as_tibble()
ergoStool %>% as_tibble()
## # A tibble: 36 × 3
##    effort Type  Subject
##     <dbl> <fct> <ord>
##  1     12 T1    1
##  2     15 T2    1
##  3     12 T3    1
##  4     10 T4    1
##  5     10 T1    2
##  6     14 T2    2
##  7     13 T3    2
##  8     12 T4    2
##  9      7 T1    3
## 10     14 T2    3
## # … with 26 more rows
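Before fitting any model, the mean effort per chair type gives a first impression of the differences. A minimal sketch with {dplyr} (assumed installed); `effort_by_type`, `mean_effort` and `sd_effort` are illustrative names.

```r
library(nlme)   # provides the ergoStool data set
library(dplyr)  # provides %>%, group_by(), summarise() and n()

# Mean, standard deviation and number of ratings per chair type
effort_by_type <- ergoStool %>%
  group_by(Type) %>%
  summarise(mean_effort = mean(effort), sd_effort = sd(effort), n = n())

effort_by_type
```

Because every subject tried every chair exactly once, the mean for T1 should equal the model intercept in the fixed-effects table, and the differences between the other means and T1 should equal the TypeT2, TypeT3 and TypeT4 estimates.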
In R, statistical models can be specified with a model formula. The left-hand side of the formula is the dependent variable; the right-hand side contains the ‘predictors’. Here we add a fixed and a random term to the model (as is common for mixed-effects models).
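A formula is itself an R object that can be inspected. The sketch below (base R plus the `ergoStool` data from {nlme}) shows how the fixed part `effort ~ Type` is expanded into a design matrix. With R's default treatment contrasts, the first level, T1, becomes the reference, which is why the model output lists coefficients only for TypeT2, TypeT3 and TypeT4. `fixed_formula` and `X` are illustrative names.

```r
library(nlme)  # provides the ergoStool data set

fixed_formula <- effort ~ Type

fixed_formula[[2]]       # left-hand side: the dependent variable
fixed_formula[[3]]       # right-hand side: the predictor(s)
all.vars(fixed_formula)  # all variable names used in the formula

# Design matrix for the fixed effects (default treatment contrasts, T1 = reference)
X <- model.matrix(fixed_formula, data = ergoStool)
colnames(X)
head(X)
```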
library(nlme)
ergo_model <- lme(
  data = ergoStool,        # the data to be used for the model
  fixed = effort ~ Type,   # the dependent and fixed-effects variables
  random = ~ 1 | Subject   # random intercepts for the Subject variable
)
The lme() function is part of the {nlme} package for mixed-effects modelling in R.
Example reproduced from: Pinheiro and Bates, 2000, Mixed-Effects Models in S and S-PLUS, Springer, New York.
| Term | Value | Std.Error | DF | t-value | p-value |
|---|---|---|---|---|---|
| (Intercept) | 8.5555556 | 0.5760123 | 24 | 14.853079 | 0.0000000 |
| TypeT2 | 3.8888889 | 0.5186838 | 24 | 7.497610 | 0.0000001 |
| TypeT3 | 2.2222222 | 0.5186838 | 24 | 4.284348 | 0.0002563 |
| TypeT4 | 0.6666667 | 0.5186838 | 24 | 1.285305 | 0.2109512 |
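The fixed-effects table above can be taken directly from the fitted model object instead of being copied by hand. A minimal sketch, assuming the model is refitted as in the code above: `summary()` on an `lme` object returns a `tTable` component, and `fixef()` and `VarCorr()` from {nlme} return the fixed-effect estimates and the estimated variance components. `fixed_table` is an illustrative name.

```r
library(nlme)  # lme(), fixef(), VarCorr() and the ergoStool data

ergo_model <- lme(
  data   = ergoStool,
  fixed  = effort ~ Type,
  random = ~ 1 | Subject
)

# Fixed-effects table: Value, Std.Error, DF, t-value, p-value
fixed_table <- summary(ergo_model)$tTable
round(fixed_table, 4)

# Only the estimated fixed effects, as a named vector
fixef(ergo_model)

# Estimated variance components: between-subject (intercept) and residual
VarCorr(ergo_model)
```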
A residual plot shows the ‘residual’ error (‘unexplained variance’) left after fitting the model. Under the normality assumption, the standardized residuals should:
plot(ergo_model) ## type = 'pearson' (standardized residuals)
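Besides the default residuals-versus-fitted plot, the standardized (Pearson) residuals can be extracted and checked for approximate normality with a normal Q-Q plot. A minimal sketch, assuming the model is refitted as above; `std_resid` is an illustrative name, and the ±2 rule of thumb is a heuristic, not a formal test.

```r
library(nlme)  # lme() and the ergoStool data

ergo_model <- lme(
  data   = ergoStool,
  fixed  = effort ~ Type,
  random = ~ 1 | Subject
)

# Standardized (Pearson) residuals at the innermost (within-subject) level
std_resid <- residuals(ergo_model, type = "pearson")

# Normal Q-Q plot: points close to the line suggest approximately normal residuals
qqnorm(std_resid)
qqline(std_resid)

# Rule of thumb: roughly 95% of standardized residuals fall between -2 and 2
mean(abs(std_resid) <= 2)
```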
Practice what you preach
If you want to reproduce, extend or falsify this example, or apply your own ideas to it, you can find the code (and data) on Github.com.
In this course, I will show you how to run, use and organize code like this!